ABSTRACT

DRAMs are widely used in portable applications due to their
high storage density. In standby mode, their main source of
power dissipation is the refresh operation that periodically
restores leaking charge in each cell to its correct level.
Conventional DRAMs use a single refresh period determined by the
cell with the largest leakage. This approach is simple but
dissipative, because it forces unnecessary refreshes for the
majority of the cells with small leakage.

In this paper we investigate a novel scheme that relies
on multiple refresh periods and small refresh blocks to
reduce DRAM dissipation by decreasing the number of cells
refreshed too often. Long periods are used to accommodate
cells with small leakage. In contrast to conventional
row-based refresh, small refresh blocks are used to increase
worst-case data retention times. Retention times are further
extended by adding a swap cell to each refresh block.

We give a novel polynomial-time algorithm for computing
an optimal set of refresh periods for block-based
multi-period refresh. Specifically, given an integer K and a
distribution of data retention times, in O(KN^2) steps our
algorithm computes K refresh periods that minimize DRAM
dissipation, where N is the number of refresh blocks in the
memory. We describe and evaluate a possible implementation
of our refresh scheme. In simulations with a 16Mb DRAM,
block-based multi-period refresh reduces standby dissipation by
a multiplicative factor of 4 with area overhead below 6%.

I  INTRODUCTION 

Main  memories  are  typically  built  using  Dynamic  Random 
Access  Memory  (DRAM)  technology.  To  maintain  their 
stored  charge  levels,  DRAM  cells  require  periodic  refreshing. 
Thus,  overall  DRAM  dissipation  depends  on  the  frequency  of 
the  refresh  waveform. 


Figure 1: Distribution of data-retention times of DRAM cells.

Conventional DRAMs use a single refresh waveform
whose period is dictated by the minimum data-retention time
of the memory cells. Figure 1 shows a typical distribution of
data retention times in DRAMs. To ensure that the state of all
cells is correct at any time, the refresh period tREF must not
exceed the shortest data-retention time tRET. For the majority
of the cells, however, data retention times are significantly
longer than tREF. Refreshing these cells with a period tREF
is therefore unnecessary and results in excessive power
dissipation.

In this paper we investigate a novel scheme for reducing
data-retention power in DRAMs by using multiple refresh
periods and refreshing on a block basis. Through the
introduction of multiple refresh periods, cells with long
data-retention times can be refreshed less often than leaky cells
with short ones. Moreover, since the number of leaky cells is
typically small [12], by refreshing cells in relatively small
groups, as opposed to entire rows, the refresh period of each
block is increased. To further extend the refresh periods of
refresh blocks, a swap cell is introduced to provide a
replacement for the most leaky cell in each block.

To accompany our scheme, we give a novel polynomial-time
algorithm for computing an optimal set of refresh periods
that minimizes refresh power dissipation. Specifically,
given an integer K and the data retention times of the refresh
blocks in the memory, our algorithm determines in O(KN^2)
steps a set of K refresh periods that minimizes the power
dissipation due to refreshing. In addition to this algorithm, we
describe a practical implementation of our scheme and evaluate
its effectiveness in reducing DRAM power dissipation. In
simulations of a 16Mb DRAM, our block-based multi-period
refresh reduces power dissipation by a multiplicative factor of
4 with an area overhead of at most 6%.

Numerous approaches have been investigated for reducing
data retention power in DRAMs. Schemes that reduce leakage
current by optimizing process conditions or by controlling
the potential of various nodes in the cell have been
reported in [8, 3]. Schemes for dynamically controlling the
refresh period through a limited number of temperature or
current sensors have been reported in [11, 14]. The use of
memory access history for reducing the number of refreshes
to previously accessed rows has been proposed in [13]. The
use of Error Correcting Codes (ECCs) to control the number
of errors below a required level while setting the refresh
period to a higher value was reported for mass storage media
in [9]. The introduction of a second refresh period for rows
with long data retention times was proposed in [6, 2]. These
papers focus on implementation issues, however, and do not
present systematic techniques for the optimal selection of the
second period. A multi-period refresh scheme that relies on
ECCs to extend the refresh period was proposed in [7], along
with an exponential-time algorithm for the optimal selection
of the multiple periods.

The remainder of this paper has six sections. Section II
gives an overview of refresh in conventional DRAM. The
proposed block-based multi-period refresh scheme is described
in Section III. Our polynomial-time algorithm for computing
a set of optimal refresh periods is presented in Section IV.
Section V describes the architecture of the proposed
block-based multi-period memory. Section VI presents simulation
results from the application of our methodology to a 16Mb
memory. We conclude our paper in Section VII with a brief
discussion of future work.


II  REFRESH  IN  CONVENTIONAL  DRAM 

Typical DRAM cells are composed of one transistor and one
capacitor. Charge stored in the cell capacitor degrades over
time due to leakage currents. Correct state retention is
ensured by periodic recharging (or refreshing) of the stored
charge. Among many known leakage currents, the junction
leakage current from the storage node is known to be the
major leakage mechanism [12]. Due to local process variations,
it varies among cells, resulting in fluctuations of tRET among
the cells [15, 5]. Studies have shown that the log2(tRET) of
the cells follows a bimodal distribution, akin to the one shown
in Figure 1. The main distribution, which comprises the
majority of the cells, is composed of cells with long tRET. A
tail distribution is composed of leaky cells with short
tRET [12].


Figure  2:  Memory  with  16  cells,  (a)  Data  retention  times  and 
(b)  refresh  pattern  using  conventional  refresh. 


Conventional DRAMs use a single refresh period tREF for
every cell. To prevent leakage-induced errors, tREF must be
set with respect to the shortest tRET in the memory array. For
example, consider the 16-cell memory shown in Figure 2(a),
where the integers inside the cells denote their data retention
times tRET. In this case, tREF must not exceed 2. Moreover,
all cells are refreshed at this rate, as indicated by their
uniform shading in Figure 2(b). Consequently, most cells are
refreshed too often, resulting in excessive power dissipation.

The data retention power Pret is given by the equation

\[ P_{ret} = \frac{A\,C}{t_{REF}} + P_{const} \approx \frac{A\,C}{t_{REF}}, \tag{1} \]

where C is the total switching capacitance, A is a constant
proportionality factor, and Pconst is a power component
independent of tREF and typically less than 10% of total
dissipation [6].

III  BLOCK-BASED  MULTI-PERIOD  REFRESH

In this section we describe our proposed scheme for reducing
data-retention power by introducing additional refresh
periods and refreshing on a block basis. In our scheme, groups
of adjacent cells, called refresh blocks, are refreshed at the
same period. The necessary refresh period of each block i,
denoted by t_i, is determined by the minimum tRET of the
cells in the block. Energy dissipation is reduced by
refreshing blocks with long t_i using an accordingly long period φ_i
such that φ_i ≤ t_i. Since the data retention times of a few
leaky cells are orders of magnitude shorter than those of most
cells [12], the necessary refresh period of each block can be
extended by decreasing the size of the blocks and by adding
a few swap cells to replace leaky cells after fabrication.

III-A  Extension  of  t_i

The effect of the basic block-based multiple-period refresh
(BM) scheme on the necessary refresh periods of the example
memory from Figure 2 is shown in Figure 3. In Figure
3(a), each row is treated as a single refresh block. In this case,
only one row requires a refresh period of 2. The other three
rows can be refreshed with longer periods. The magnitudes of
the t_i's are encoded in the shading of the refresh blocks, with
lighter shading indicating longer t_i's. In Figure 3(b), each
row is divided into two refresh blocks. As evidenced by the
increase in the number of cells with light shading, the number
of refresh blocks with longer t_i increases as block size
decreases. Only one block requires a refresh period of 2 in
this case, with the t_i's of the remaining seven blocks
satisfying t_i ≥ 4.

Figure 3: Extension of necessary refresh periods using BM
with (a) 4-cell blocks and (b) 2-cell blocks.

The necessary refresh periods t_i can be further extended
by using one or more swap cells per refresh block. These
cells are used to store data which would otherwise be stored
in leaky cells in the block. If the tRET of the swap cell is
longer than that of the leaky cell, then routing the input and
the output of the worst-case cell to those of the swap cell
guarantees correct data storage while extending t_i.



Figure  4:  Extension  of  necessary  refresh  periods  with  eBM 
using  one  swap  cell  and  (a)  4-cell  blocks  or  (b)  2-cell  blocks. 

The impact of the enhanced BM (eBM) scheme, using
one additional swap cell per refresh block, is shown in Figure
4. In this figure, cells that are swapped with the additional
cell are left unshaded. In comparison with the original
memory in Figure 3, eBM increases the t_i's of the blocks, thus
increasing the possible φ_i's. As block size decreases, the t_i's
increase further, at the cost of additional area overhead due to
the added swap cells. When each row is divided into two
blocks, one block requires a refresh period of 2, while for the
remaining seven blocks, the necessary refresh period is 5 or
longer. If the tRET of the swap cell is the shortest one in the
block, then swapping reduces the t_i of the block. Since the
probability of a leaky cell being the swap cell is very small, however,
its impact on overall t_i extension is negligible.
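The computation of the necessary refresh periods can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the retention values in `row` and `swap` are hypothetical (they are not the values of Figure 2), the function name `necessary_periods` is ours, and the swap is assumed to be applied only when it helps, so that the t_i of a block with one swap cell is the second-smallest tRET among its b cells plus the swap cell.

```python
def necessary_periods(cells, block_size, swap=None):
    """Return the necessary refresh period t_i of every block.

    cells      -- tRET of each cell in one row (hypothetical units)
    block_size -- number of cells b per refresh block
    swap       -- optional tRET of the swap cell of each block (eBM)
    """
    periods = []
    for start in range(0, len(cells), block_size):
        block = cells[start:start + block_size]
        if swap is None:
            # BM: the leakiest cell dictates the period of the block.
            periods.append(min(block))
        else:
            # eBM (assumed policy): the single worst cell among the b cells
            # and the swap cell is left unused, so the second-smallest tRET
            # dictates the period.
            candidates = sorted(block + [swap[start // block_size]])
            periods.append(candidates[1])
    return periods


row = [2, 9, 12, 10, 4, 11, 9, 12]                 # hypothetical tRET values
bm4 = necessary_periods(row, 4)                    # one block per 4 cells
bm2 = necessary_periods(row, 2)                    # one block per 2 cells
ebm4 = necessary_periods(row, 4, swap=[11, 10])    # 4-cell blocks, one swap cell
```

With these hypothetical values, halving the block size confines the leaky cells to fewer blocks, and adding a swap cell raises the necessary period of both 4-cell blocks.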

III-B  Dependency  of  power  on  φ_i  selection

If the number of refresh periods is unlimited, then for each
block i, φ_i can be set equal to t_i. Since the number of refresh
periods is physically limited, however, the φ_i of each block
must satisfy the inequality φ_i ≤ t_i.




Figure  5:  Assignment  of  refresh  periods  from  (a)  the  set 
{2, 5}  and  (b)  the  set  {2, 8}. 

Figure 5 shows the assignment of two refresh periods with
each memory row divided into two refresh blocks. When
refresh periods are selected from the set {2, 5}, the two blocks
with t_i equal to 2 and 4 must be refreshed with period 2, while
the remaining six are refreshed with period 5. When refresh
periods are selected from the set {2, 8}, the four blocks with
t_i equal to 2, 4, 5, and 6 must be refreshed with period 2, and
the other four blocks can be refreshed with period 8.

Total data retention power depends on the selection of
refresh periods. To achieve maximum power savings, an
optimal set of refresh periods must be computed. Revisiting
Eq. 1, the data-retention power Pret of a memory that uses
block-based multi-period refresh with each block i comprising
b cells refreshed at an assigned refresh period φ_i is given
by

\[ P_{ret} = A \cdot c \cdot b \cdot \sum_{i=1}^{MR} \frac{1}{\phi_i}, \tag{2} \]

where M is the number of blocks per row, R is the total number
of rows in the array, c is the switching capacitance for
refreshing one bit, and A is a constant proportionality factor.
Using this equation, it follows that for the memory in Figure
5, the refresh period set {2, 5} results in lower data-retention
power.
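The comparison of the two period sets can be checked numerically. The sketch below evaluates the sum in Eq. 2 with the common factor A · c · b dropped. The t_i values 2, 4, 5, and 6 are taken from the text; the remaining four values are assumptions, chosen only so that they are at least 8 as the text requires. The helper names are ours.

```python
def assign_period(t_i, periods):
    """Pick the longest available period that does not exceed t_i."""
    return max(p for p in periods if p <= t_i)


def relative_power(block_t, periods):
    """Sum of 1/phi_i over all blocks, i.e. Eq. 2 without the factor A*c*b."""
    return sum(1.0 / assign_period(t, periods) for t in block_t)


# Necessary periods of the eight blocks; the last four are assumed values >= 8.
block_t = [2, 4, 5, 6, 8, 8, 9, 10]

p_2_5 = relative_power(block_t, [2, 5])   # 2 blocks at period 2, 6 at period 5
p_2_8 = relative_power(block_t, [2, 8])   # 4 blocks at period 2, 4 at period 8
```

The set {2, 5} gives 2/2 + 6/5 = 2.2 and the set {2, 8} gives 4/2 + 4/8 = 2.5, so {2, 5} dissipates less, in agreement with the statement above.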

IV  ALGORITHM  FOR  SELECTING  OPTIMAL  PERIODS

In  this  section  we  describe  a  polynomial-time  algorithm  that 
computes  a  set  of  optimal  refresh  periods  for  block-based 
multi-period  refresh,  given  the  size  of  the  set  and  the  data 
retention  times  of  the  refresh  blocks  in  memory. 


Figure  6:  Distribution  of  necessary  refresh  times. 

The distribution of the data retention times can be
determined during post-fabrication testing of the memory arrays.
The data retention characteristics of the cells are first
measured and then used in bypassing the rows containing the cells
with short tRET. (This approach was also used in [6] to
assign rows to one of the two refresh periods.) Subsequently,
they are processed further before they are provided as input
to our proposed algorithm. Specifically, as shown in Figure
6, the tRET space is divided into L bins. Each bin
i contains B_i blocks that must be refreshed at a period no
longer than T_i. As we discuss in Section V, the area
overhead of our scheme is kept low if the periods φ_1, φ_2, ..., φ_K
selected by the algorithm are integral multiples of the basic
refresh period. This property of the solution can be ensured
by selecting the bins at periods that are integral multiples of
the original period. The inputs to the algorithm are the
resulting distribution DIST_L and the total number of refresh
periods K.

Since K < L, in general, there exist bins j whose blocks
will be refreshed at a period shorter than T_j. The power
associated with refreshing blocks in bins i+1 through j with a
period T_{i+1} is the sum of the power dissipation in all blocks
of these bins and is given by the expression

\[ P_{i+1,j} = \frac{D}{T_{i+1}} \sum_{n=i+1}^{j} B_n, \tag{3} \]

where D is a constant proportionality factor. Given a selection
of K refresh periods, the total data-retention power is the sum
of these partial powers up to bin L. Among the C(L, K) possible
combinations, we need to compute the one(s) for which total
power dissipation is minimized.
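As an illustration of Eq. 3, the sketch below evaluates the total power of one given selection of periods using a running sum of the bin populations. The function name, the 1-based `chosen` list, and the example bins are our assumptions; the bins merely re-express the eight-block example of Section III-B, with the four longest-lived blocks assumed to fall in a bin at T = 8.

```python
from itertools import accumulate


def total_power(B, T, chosen, D=1.0):
    """Total power for a selection of periods, as a sum of Eq. 3 terms.

    B, T   -- block count and period bound of bins 1..L (T ascending)
    chosen -- ascending 1-based bin indices whose T are the selected
              periods; bin 1 must be included so that every block is covered
    """
    accm = [0] + list(accumulate(B))          # accm[j] = B_1 + ... + B_j
    bounds = list(chosen) + [len(B) + 1]
    power = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        # Bins lo .. hi-1 are all refreshed with period T_lo.
        power += D * (accm[hi - 1] - accm[lo - 1]) / T[lo - 1]
    return power


T = [2, 4, 5, 6, 8]     # period bound of each bin
B = [1, 1, 1, 1, 4]     # blocks per bin (last bin is an assumed grouping)

p_a = total_power(B, T, [1, 3])   # periods {2, 5}
p_b = total_power(B, T, [1, 5])   # periods {2, 8}
```

Evaluating every possible selection this way grows combinatorially with L and K, which is what the recursion described next avoids.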

The key to obtaining an efficient algorithm for the overall
optimization problem is to express optimal solutions to
subproblems in a recursive manner. Dynamic programming can
then be used to solve the recursion in a polynomial number of
steps. Let P_{j,k} be the minimum power required to refresh the
blocks in the first j bins using k periods. As can be verified
with the help of Figure 6, the optimal dissipation P_{j,k} is
obtained by introducing an additional refresh period T_{i+1} to
the optimal solution obtained with k - 1 periods up to some bin i.
This fact is captured by the following recursion.

\[ P_{j,k} = \min_{0 \le i < j} \left\{ P_{i,k-1} + \frac{D}{T_{i+1}} \sum_{n=i+1}^{j} B_n \right\}. \tag{4} \]

Since the minimum total power is required, the period T_{i+1}
that minimizes P_{j,k} is selected as the kth period.

    accm_0 ← 0
    S_{0,0} ← ∅
    for l = 1 to L do
        accm_l ← accm_{l-1} + B_l
        P_{l,1} ← D · accm_l / T_1
        S_{l,1} ← {T_1}
    end for
    for k = 2 to K do
        P_{0,k-1} ← 0
        S_{0,k-1} ← S_{0,k-2} ∪ {T_1}
        for j = 1 to L do
            P_{j,k} ← ∞
            for i = 0 to j - 1 do
                partP ← D · (accm_j - accm_i) / T_{i+1}
                if P_{j,k} > P_{i,k-1} + partP then
                    P_{j,k} ← P_{i,k-1} + partP
                    S_{j,k} ← S_{i,k-1} ∪ {T_{i+1}}
                end if
            end for
        end for
    end for
    return S_{L,K}

Figure  7:  Algorithm  for  computing  optimal  set  of  periods. 

Figure 7 gives pseudocode for solving the recursion in
Equation 4. The variable S_{j,k} maintains the optimal set of
k periods minimizing P_{j,k}. When the algorithm terminates,
the K optimal periods are found in S_{L,K}. The
compute-intensive part of the algorithm is the three nested loops, and its
total runtime is O(KL^2) = O(KN^2), since L ≤ N.
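For readers who prefer executable code, the following is a minimal Python sketch of the dynamic program of Figure 7. It is our transcription, not the authors' implementation: the function name, the table layout `P[k][j]`, the use of tuples for the period sets, and the example bins (which reuse the eight-block example with an assumed last bin at T = 8) are all assumptions made for illustration.

```python
import math


def optimal_periods(B, T, K, D=1.0):
    """Return (minimum power, selected periods) for bins 1..L and K periods.

    B[n-1] is the number of blocks in bin n and T[n-1] its period bound,
    with T sorted in ascending order.
    """
    L = len(B)
    accm = [0] * (L + 1)                      # accm[j] = B_1 + ... + B_j
    for l in range(1, L + 1):
        accm[l] = accm[l - 1] + B[l - 1]

    # P[k][j]: minimum power for the first j bins using at most k periods.
    P = [[math.inf] * (L + 1) for _ in range(K + 1)]
    S = [[()] * (L + 1) for _ in range(K + 1)]
    for k in range(K + 1):
        P[k][0] = 0.0                         # no bins, no power

    for k in range(1, K + 1):
        for j in range(1, L + 1):
            for i in range(j):
                # Bins i+1 .. j are refreshed with period T_{i+1} (Eq. 3).
                part = D * (accm[j] - accm[i]) / T[i]
                if P[k - 1][i] + part < P[k][j]:
                    P[k][j] = P[k - 1][i] + part
                    S[k][j] = S[k - 1][i] + (T[i],)
    return P[K][L], S[K][L]


T = [2, 4, 5, 6, 8]
B = [1, 1, 1, 1, 4]
power2, periods2 = optimal_periods(B, T, 2)
power3, periods3 = optimal_periods(B, T, 3)
```

For K = 2 the sketch selects the periods 2 and 5, which matches the comparison made in Section III-B. The three nested loops mirror the O(KL^2) bound stated above.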

V  ARCHITECTURAL  ORGANIZATION 

In this section we describe a hardware organization that
implements block-based multi-period refresh. Our architecture
uses clock periods that are integral multiples of the shortest
period, allowing the blocks with longer assigned periods to
skip refresh until their period arrives. Such skipping was also
used in [6] to implement two-period refresh.

Figure 8 gives a schematic description of the proposed
architecture. The refresh period signals are generated in the
refresh period generator. For each block, the assigned refresh
period is mapped in the refresh class memory. The stored
period is compared with the generated period each time the row
containing the block is addressed for refresh. If they agree,
block refresh is enabled by the refresh enabler. The dotted
blocks need to be added to support the swap cell scheme.
When refreshing of a block is enabled, the pointer to the cell
to be swapped is read from the bit pointer memory, and the
data are swapped by the bit selector. In the following
subsections, we give a more detailed description of our architecture
for memories configured with R rows, M blocks per row, b
cells per block, one swap cell per block, and K refresh
periods.

Figure 8: Schematic description of BM DRAM architecture.
Shaded modules constitute conventional DRAM.

V-A  Implementation  of  BM 

Due to process variations, every memory will have a different
leakage current distribution. Hence, to guarantee maximum
power reduction for every memory, the refresh period
generator must be programmable.


Figure  9:  Refresh  signal  generator. 

Figure 9 describes a refresh period generator. It is
composed of K - 1 counters of programmable period that generate
K - 1 additional refresh periods, added to the original period
generated from the refresh address counter. The counters are
incremented each time the refresh address counter completes
a cycle, thus generating integral multiples of the original
period. The cycle period of each counter can be programmed to
one of the added periods by programming its reset condition.
This is accomplished by connecting an output of the d:2^d
decoder, which is asserted once every period, to the counter
reset by means of fuses. The output of the refresh period
generator is a K-bit signal indicating, for each counter, whether
the current cycle is its programmed period or not.

Information  about  the  refresh  period  assigned  to  each  block 
is  stored  in  the  refresh  class  memory.  To  minimize  memory 
size,  the  actual  information  stored  in  the  refresh  memory  is 
not  the  period  itself  but  the  pointer  to  the  counter  in  the  refresh 
signal  generator  that  generates  the  period.  Thus,  its  total  size 
is  R  x  M  x  log2K.  The  refresh  class  memory  is  indexed 
by  the  refresh  address  and  outputs  a  pointer  of  log2K  bits  for 
each  block. 

Block refreshing, which should be performed only when
the current period matches the assigned period, is enabled by
the refresh enabler. As shown in Figure 10, the refresh
enabler is a simple array of K:1 multiplexors. The input to the
multiplexors is a K-bit refresh signal from the refresh signal
generator, and its control signal is the log2K-bit refresh class
from the refresh class memory. Thus, the refresh of a block
is enabled only when its corresponding refresh signal is
asserted.

Figure 10: Refresh enabler

The word lines and sense-enable lines in the memory array
are divided into sub-lines covering each block. The refresh
of the blocks is enabled only when both the row containing
the block is addressed for refresh and the refresh of the block
is enabled. A similar approach of dividing the word lines is
proposed in [10] to reduce the switching capacitance in
memory.
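The interplay of the refresh period generator, the refresh class memory, and the refresh enabler can be mimicked with a short behavioral model. This is a software sketch only, not a description of the circuit: the function names, the choice of multiples 1, 2, and 4, and the class assignment of the four blocks are hypothetical.

```python
def refresh_signals(cycle, multiples):
    """K-bit refresh signal for one pass of the refresh address counter.

    multiples[k] is the programmed period of class k as an integral
    multiple of the base period; multiples[0] == 1 is the base period.
    """
    return [cycle % m == 0 for m in multiples]


def refresh_enable(signals, refresh_class):
    """K:1 multiplexor: the stored class pointer selects one signal bit."""
    return signals[refresh_class]


def count_refreshes(block_classes, multiples, n_cycles):
    """Count how often each block is refreshed over n_cycles base periods."""
    counts = [0] * len(block_classes)
    for cycle in range(n_cycles):
        signals = refresh_signals(cycle, multiples)
        for block, cls in enumerate(block_classes):
            if refresh_enable(signals, cls):
                counts[block] += 1
    return counts


multiples = [1, 2, 4]            # hypothetical programmed periods
block_classes = [0, 2, 1, 2]     # hypothetical refresh class memory contents
counts = count_refreshes(block_classes, multiples, 8)
```

Over eight base periods, the block in class 0 is refreshed every time, while blocks in the classes with multiples 2 and 4 skip all but every second and every fourth pass, respectively.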

V-B  Implementation  of  eBM 

The position of the cell to be swapped for each block is stored
in the bit pointer memory. To reduce power consumption, the
bit pointer memory is read only when the refresh of the block
is enabled. This is accomplished by dividing the word lines into
blocks and activating them only when the refresh of the block
is enabled. The bit pointer memory is indexed by the refresh
address and outputs a log2b-bit pointer. Thus, its total size is
R x M x log2b.

The  refresh  on  the  bad  cell  is  redirected  to  the  swap  cell 
by  the  bit  selector.  The  bit  selector  of  a  block  consists  of  a 
log2b:b  decoder  and  b  bidirectional  multiplexors. 


Figure 11: Bit selector array

As shown in Figure 11, the decoder takes the bit pointer
from the bit pointer memory as input and selects the
multiplexor of the cell to be swapped, thus redirecting the access
on the bit line of the bad cell to that of the swap cell.

The number of extra memory cells required for employing eBM
with one swap cell per block is R x M x 1.
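The storage added by the scheme follows directly from the sizes given in this section. The sketch below tallies the bits of the refresh class memory, the bit pointer memory, and the swap cells. It rests on two assumptions of ours: pointers are rounded up to a whole number of bits, and the number of blocks R x M is obtained by dividing the array size by b. Logic modules are not counted, so the figures are not the area overheads reported in Section VI.

```python
def ceil_log2(n):
    """Smallest number of bits that can index n distinct values (n >= 1)."""
    return (n - 1).bit_length()


def added_storage_bits(total_cells, b, K, swap_cell=False):
    """Return (class memory bits, bit pointer bits, swap cells)."""
    blocks = total_cells // b                       # R x M refresh blocks
    class_bits = blocks * ceil_log2(K)              # R x M x log2 K
    pointer_bits = blocks * ceil_log2(b) if swap_cell else 0   # R x M x log2 b
    swap_cells = blocks if swap_cell else 0         # R x M x 1
    return class_bits, pointer_bits, swap_cells


CELLS_16MB = 16 * 2**20
bm_bits = added_storage_bits(CELLS_16MB, b=128, K=12)
ebm_bits = added_storage_bits(CELLS_16MB, b=256, K=10, swap_cell=True)
```

All three terms are proportional to R x M, so halving the block size doubles the added storage, and the bit pointer memory dominates for eBM.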

VI  SIMULATION  RESULTS 

We have evaluated the effectiveness of block-based
multi-period refresh on a 16Mb DRAM fabricated in 0.6 µm
technology whose distribution of data retention times is given in
[12]. The data-retention power of this memory in the
self-refresh mode is 5.5 mW at a refresh period of 64 ms [1]. For
our block-based multi-period refresh scheme the power
dissipation and the area overhead of the added logic modules were
computed using the Epoch CAD tool at a 0.7 µm standard
cell technology. The power dissipation of the added memory
modules was estimated by scaling the power dissipation of the
16Mb memory. Our simulation results show that block-based
multi-period refresh can result in significant savings in
data-retention power with minimal area overhead. In this section
we present our results and describe the configurations
achieving minimum power.


VI-A  Data-retention  power 


Figure 12: Pret reduction (P_conv / P_BM|eBM) versus block
size and number of allowed refresh periods for (a) BM and
(b) eBM

Figure 12 shows the relative power reductions of BM and
eBM over conventional single-period refresh. Data retention
power decreases by up to a factor of 4. Our simulations took
into account both the frequency-dependent and the
frequency-independent components of memory power dissipation. With a
fixed number of refresh periods, block size has a convex effect
on power reduction, resulting in maximum power reduction in
the range of about one hundred cells per block. For smaller
block sizes, the power overhead of the additional memories
overtakes the benefits of longer retention times. With a fixed
block size, power reduction reaches an asymptotic maximum
as the number of refresh periods increases. In general, eBM
behaves similarly to BM and achieves larger maximum
power reduction at larger block sizes.


Figure 13: Difference in relative power reduction between
BM and eBM (P_conv/P_BM - P_conv/P_eBM).

Figure 13 shows the difference in the reduction of
data-retention power using BM and eBM. The introduction of a
swap cell is beneficial for larger block sizes, due to the longer
φ_i's. eBM consumes more power when block sizes are
small, since the reduction in the data retention power is offset
by the power consumed in the large bit pointer memory.

VI-B  Area  overhead 


In  this  section  we  discuss  the  area  overhead  of  our  proposed 
scheme,  using  transistor  counts  as  the  measure. 


Figure 14: Relative area overhead (extra tr. / 16M tr.) versus
block size and number of allowed refresh periods for (a) BM
and (b) eBM


The area overhead of our scheme is shown in Figure 14.
Area overhead decreases as the block size increases. It
increases as the number of refresh periods increases. The area
overhead of eBM increases drastically as the block size
decreases due to the bit pointer memory. It still stays below
10% for large block sizes. Though this work was done on a
16Mb DRAM, the area overhead will scale linearly with the
memory size as discussed in Sec. V.

VI-C  Maximum  power  reduction  configuration


The  memory  configurations  that  achieve  maximum  reduction 
in  data-retention  power  are  shown  in  Figure  15. 


Figure  15:  Optimum  configuration  achieving  minimum 
power  for  (a)  BM  and  (b)  eBM 

The vertical lines denote the selected refresh periods. As
expected, the optimum block size for eBM is larger than that
of BM, due to the swapping of leaky cells, which reduces the
number of blocks with short φ_i. With BM, Pret is reduced by a
factor of 3.93 using blocks of 128 cells and 12 refresh periods. The
area overhead, in transistor count, is 2.4%. For eBM, Pret
is reduced by a factor of 4.23 using blocks of 256 cells with 1
swap cell per block and 10 refresh periods. The area overhead
is 5.4%. Except for a few short refresh periods, the rest of the
periods are closely placed due to the converging φ_i's.

VII  CONCLUSION

This paper describes a novel block-based multi-period
refresh scheme for low data-retention power in dynamic
memories. A novel polynomial-time algorithm is given for selecting
an optimal set of refresh periods that minimizes data
retention power. In addition, an implementation of the proposed
scheme is described. Simulation results with a 16Mb DRAM
show power reductions up to a factor of 4 over conventional
refresh, with an area overhead less than 6%. We are currently
evaluating the effect of process variations on the effectiveness
of our scheme.